Abstract. We consider the enumeration problem of first-order queries over structures of
bounded degree. Durand and Grandjean have shown that this problem is in
CONSTANT-DELAY$_{lin}$. An enumeration problem belongs to CONSTANT-DELAY$_{lin}$ if for an input of
size $n$ it can be solved by (i) an $O(n)$ precomputation phase building an index structure,
followed by (ii) a phase enumerating the answers with no repetition and a constant delay
between two consecutive outputs. In this article we give a different proof of this result
based on Gaifman's locality theorem for first-order logic. Moreover, the constants we
obtain yield a total evaluation time that is triply exponential in the size of the input
formula, matching the complexity of the best known evaluation algorithms.



1. Introduction. 

Model checking is the problem of testing whether a given sentence is true in a given model. 
It's a classical problem in many areas of computer science, in particular in verification. If 
the formula is no longer a sentence but has free variables then we are faced with the query 
evaluation problem. In this case the goal is to compute all the answers of a given query on 
a given database. 

As for model checking, query evaluation is a problem often requiring a time at least 
exponential in the size of the query. Even worse, the evaluation often requires a time of the
form $n^{O(k)}$, where $n$ is the size of the database and $k$ the size of the query. This is dramatic,
even for small $k$, when the database is huge.

However there are restrictions on the structures that make things easier. For instance
MSO sentences can be tested in time linear in $n$ over structures of bounded tree-width [2]
and MSO queries can be evaluated in time linear in $n + m$, where $m$ is the size of the output
of the query (note that $m$ could be exponential in the number of free variables of the query,
and hence in $k$) [3].

In this paper we are concerned with first-order logic (FO) and structures of bounded
degree. In this case the model checking problem for FO sentences is known to be linear
in $n$ [9]. Moreover, the constant factor is at most triply exponential in the size $k$ of the
formula [5]. This last algorithm easily extends to query evaluation, yielding an algorithm
working in time $f(k)(n + m)$ where $f$ is a triply exponential function.

As we already mentioned, the size m of the output may be exponential in the arity of the 
formula and therefore may still be large. In many applications enumerating all the answers 
may already consume too many of the allowed resources. In this case it may be appropriate 
to first output a small subset of the answers and then, on demand, output a subsequent 
small number of answers and so on until all possible answers have been exhausted. To 
make this even more attractive it is preferable to be able to minimize the time necessary to 
output the first answers and, from a given set of answers, also minimize the time necessary 
to output the next set of answers - this second time interval is known as the delay. 

We say that a query can be evaluated in linear time and constant delay if there exists an
algorithm consisting of a preprocessing phase taking time linear in $n$ which is then followed
by an output phase printing the answers one by one, with no repetition and with a constant
delay between each output. Notice that if a linear time and constant delay algorithm exists
then the time needed for the total query evaluation problem is bounded by $f(k)(n + m)$
for some function $f$. Hence this is indeed a restriction of the linear time query evaluation
algorithms mentioned above. To the best of our knowledge it is not yet known whether a
bound $f(k)(n + m)$ for some function $f$ on a query evaluation problem implies the existence
of a linear time and constant delay enumeration algorithm. We conjecture this is not the
case.

It was shown in [3] that linear time constant delay query evaluation algorithms could
be obtained for FO queries over structures of bounded degree, hence improving the results
of [9] and [5].

The proof of [3] is based on an intricate quantifier elimination method. In this paper 
we provide a different proof of this result based on Gaifman Locality of FO queries. Our 
algorithm can be seen as an extension of the algorithm of [5] to queries. However the index 
structure built during the preprocessing phase is more complicated than the one of [5] in 
order to obtain the constant delay enumeration. Moreover, our constant factor is triply 
exponential in the size of the formula, while it is not clear whether the constant factor 
obtained in [3] is elementary. Note that the triply exponential constant factor cannot be 
significantly improved: it is shown in [5] that a constant factor only doubly exponential 
in the size of the formula is not possible unless the parametrized complexity class AW[*] 
collapses to the parametrized class FPT. 

2. Definitions. 

2.1. Gaifman locality and first-order logic. A relational signature is a tuple
$\sigma = (R_1, \ldots, R_l)$, each $R_i$ being a relation symbol of arity $r_i$. A relational structure over
$\sigma$ is a tuple $\mathcal{A} = (A, R_1^{\mathcal{A}}, \ldots, R_l^{\mathcal{A}})$, where $A = \{a_1, \ldots, a_m\}$ is the set of elements of
$\mathcal{A}$ and $R_i^{\mathcal{A}}$ is a subset of $A^{r_i}$. We fix a reasonable encoding of structures by words over
some finite alphabet. The size of $\mathcal{A}$ is denoted by $\|\mathcal{A}\|$ and is the length of the encoding of $\mathcal{A}$.

The Gaifman graph of a relational structure $\mathcal{A}$, denoted by $G(\mathcal{A})$, is defined as follows:
the set of vertices of $G(\mathcal{A})$ is $A$ and there is an edge $(a, b)$ in $G(\mathcal{A})$ iff there exists a relation
$R_i$ and a tuple $t \in R_i^{\mathcal{A}}$ such that both $a$ and $b$ occur in $t$. Given $a, b \in A$, the distance
between $a$ and $b$, denoted $\delta(a, b)$, is the length of a shortest path between $a$ and $b$ in $G(\mathcal{A})$,
or $\infty$ if $a$ and $b$ are not connected. The distance between two tuples $\bar a = (a_1, \ldots, a_k)$ and
$\bar b = (b_1, \ldots, b_l)$ of $\mathcal{A}$, denoted $\delta(\bar a, \bar b)$, is $\min\{\delta(a_i, b_j) : 1 \le i \le k,\ 1 \le j \le l\}$. For a given
$r \in \mathbb{N}$ and a given tuple of elements $\bar a$ of some structure $\mathcal{A}$, we denote by $N_r(\bar a)$ the set of
all elements of $A$ whose distance from $\bar a$ is at most $r$. The $r$-neighborhood
of $\bar a$, denoted $\mathcal{N}_r(\bar a)$, is the substructure of $\mathcal{A}$ induced by $N_r(\bar a)$ and expanded with one
constant for each element of $\bar a$. Given two tuples of elements $\bar a$ and $\bar b$ we say that they have
the same $r$-neighborhood type, written $\mathcal{N}_r(\bar a) \simeq \mathcal{N}_r(\bar b)$, if there is an isomorphism between
$\mathcal{N}_r(\bar a)$ and $\mathcal{N}_r(\bar b)$.
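The definitions of the Gaifman graph and of the set $N_r(\bar a)$ can be illustrated by a short sketch. The dictionary encoding of a structure and the names `gaifman_graph` and `neighborhood` are assumptions made for this illustration only, and self-loops are left out of the graph since they do not affect distances.

```python
from collections import deque
from itertools import combinations

def gaifman_graph(domain, relations):
    """Adjacency sets of the Gaifman graph: two distinct elements are
    adjacent iff they occur together in some tuple of some relation."""
    adj = {a: set() for a in domain}
    for tuples in relations.values():
        for t in tuples:
            for a, b in combinations(set(t), 2):
                adj[a].add(b)
                adj[b].add(a)
    return adj

def neighborhood(adj, centers, r):
    """The set N_r(centers): elements at distance at most r from the tuple."""
    dist = {c: 0 for c in centers}
    queue = deque(centers)
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not explore beyond radius r
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

# Hypothetical example: the path 1 - 2 - 3 - 4 - 5 given by a binary relation E.
domain = [1, 2, 3, 4, 5]
relations = {"E": [(1, 2), (2, 3), (3, 4), (4, 5)]}
adj = gaifman_graph(domain, relations)
near = neighborhood(adj, (1,), 2)  # {1, 2, 3}
```

On a $d$-degree-bounded structure the search visits a number of elements that depends only on $d$ and $r$, which is what makes neighborhoods cheap to compute.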

We consider first-order logic (FO) built from atomic formulas of the form $x = y$ or
$R_i(x_1, \ldots, x_{r_i})$ for some relation $R_i$ and closed under the usual Boolean connectives ($\neg$, $\vee$, $\wedge$)
and existential and universal quantifications ($\exists$, $\forall$). When writing $\varphi(\bar x)$ we always mean that
$\bar x$ are exactly the free variables of $\varphi$. Given a structure $\mathcal{A}$ and a tuple $\bar a$ of elements of $\mathcal{A}$,
we write $\mathcal{A} \models \varphi(\bar a)$ if the formula $\varphi$ is true in $\mathcal{A}$ after replacing its free variables with $\bar a$. As
usual $|\varphi|$ denotes the size of $\varphi$.

We are now ready to state Gaifman locality for FO. 

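Theorem 2.1 (Gaifman Locality Theorem [7]). For any first-order formula $\varphi(\bar x)$, for every
structure $\mathcal{A}$ and tuples $\bar a, \bar b$, we have $\mathcal{N}_r(\bar a) \simeq \mathcal{N}_r(\bar b)$ implies $\mathcal{A} \models \varphi(\bar a)$ iff $\mathcal{A} \models \varphi(\bar b)$, where
$r = 2^{|\varphi|}$.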

Given $d \in \mathbb{N}$, a structure is said to be $d$-degree-bounded if the degree of its Gaifman
graph is bounded by $d$. The following nice algorithmic property of $d$-degree-bounded
structures can be proved using Theorem 2.1.

Theorem 2.2 ([9, 5]). Fix $d \in \mathbb{N}$. The problem of whether a given $d$-degree-bounded
structure $\mathcal{A}$ satisfies a given first-order sentence $\varphi$ is decidable in time
$2^{d^{2^{O(|\varphi|)}}} \cdot \|\mathcal{A}\|$.



2.2. Model of computation and CONSTANT-DELAY$_{lin}$ class. We use Random Access
Machines (RAM) with addition and uniform cost measure as a model of computation. For 
further details on this model and its use in logic see [3]. 

An enumeration problem is a binary relation. Given an enumeration problem $R$ and
an input $x$, a solution for $x$ is a $y$ such that $(x, y) \in R$. An enumeration problem $R$
induces a computational problem as follows: given an input $x$, output all its solutions. An
enumeration problem is in the class CONSTANT-DELAY$_{lin}$ if on input $x$ it can be decomposed
into two steps:

• a precomputation phase that is performed in time 0(|x|), 

• an enumeration phase that outputs all the solutions for $x$ with no repetition and a
constant delay between two consecutive outputs. The enumeration phase has full
access to the output of the precomputation phase but can use only a constant total
amount of extra memory.

In particular if $R$ is in CONSTANT-DELAY$_{lin}$ then the enumeration problem $R$ can be solved
in time $O(|x| + |\{y : R(x, y)\}|)$. To the best of our knowledge it is not known whether
the converse is true or not. We conjecture that it is not. More details about
CONSTANT-DELAY$_{lin}$ can be found in [3].
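The total time bound can be spelled out as follows. The constants $c_1$ and $c_2$ are names introduced here for the factor hidden in the linear precomputation and for the delay, and the extra $+1$ is an assumption accounting for the final step that detects that no solution remains.

```latex
\begin{align*}
T(x) &\le \underbrace{c_1 \cdot |x|}_{\text{precomputation}}
        + \underbrace{c_2 \cdot \bigl(|\{y : R(x,y)\}| + 1\bigr)}_{\text{enumeration}} \\
     &= O\bigl(|x| + |\{y : R(x,y)\}|\bigr).
\end{align*}
```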

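We are interested in the following enumeration problem for $\varphi(\bar x) \in$ FO and $d \in \mathbb{N}$:

$$\mathrm{Enum}_d(\varphi) = \{(x, y) : x \text{ is a } d\text{-degree-bounded structure } \mathcal{A},\ y \text{ is a tuple } \bar a \text{ of elements of } \mathcal{A} \text{ and } \mathcal{A} \models \varphi(\bar a)\}$$

We further denote by $\varphi(\mathcal{A})$ the set $\{\bar a : \mathcal{A} \models \varphi(\bar a)\}$ and by $|\varphi(\mathcal{A})|$ the cardinality of
this set. We show that $\mathrm{Enum}_d(\varphi)$ is in CONSTANT-DELAY$_{lin}$.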

Theorem 2.3 ([3]). There is an algorithm that for all $d \in \mathbb{N}$, all $\varphi \in$ FO and all
$d$-degree-bounded structures $\mathcal{A}$ enumerates $\varphi(\mathcal{A})$ with a precomputation phase taking time
$2^{d^{2^{O(|\varphi|)}}} \cdot \|\mathcal{A}\|$ and a delay during the enumeration phase that is triply exponential in $|\varphi|$.
In particular, for all $d \in \mathbb{N}$ and all $\varphi \in$ FO the enumeration problem $\mathrm{Enum}_d(\varphi)$ is in
CONSTANT-DELAY$_{lin}$. Moreover, if the domain of $\mathcal{A}$ is linearly ordered, the algorithm
enumerates $\varphi(\mathcal{A})$ in increasing order relative to the induced lexicographical order on tuples.

Hence the total query evaluation induced by the enumeration procedure of Theorem 2.3
is in time $2^{d^{2^{O(|\varphi|)}}} (\|\mathcal{A}\| + |\varphi(\mathcal{A})|)$, thus matching the model checking complexity of
Theorem 2.2. Our proof of Theorem 2.3 is based on the Gaifman Locality Theorem while the proof
of [3] uses a quantifier elimination procedure (see also [8] for a similar argument). Note that
it is not clear from the proof of [3] that their algorithm is triply exponential in the size of
the formula.



3. FO query evaluation.

In this section we assume $d \in \mathbb{N}$ to be fixed and all our structures are $d$-degree-bounded.

A formula $\varphi(\bar x)$ with $k$ free variables $\bar x = x_1 \ldots x_k$ is said to be connected around $x_1$ if
$\varphi(\bar x)$ logically implies that $x_2, \ldots, x_k$ are in the $(rk)$-neighborhood of $x_1$ for $r = 2^{|\varphi|}$.

Let $\mathcal{T}_{rk}$ be the set of all isomorphism types of $(rk)$-neighborhoods of single elements, i.e.
the isomorphism types of structures of the form $\mathcal{N}_{rk}(a)$ for some element $a$ of some structure
$\mathcal{A}$. By $(rk)$-neighborhood-type of an element $a$ we mean the isomorphism type of its
$(rk)$-neighborhood. Because our structures are $d$-degree-bounded each $(rk)$-neighborhood has at
most $d^{rk}$ elements. For each $\tau \in \mathcal{T}_{rk}$ we denote by $\mu_\tau(x)$ the fact that the
$(rk)$-neighborhood-type of $x$ is $\tau$. For each type in $\mathcal{T}_{rk}$ we fix a representative for the corresponding
$(rk)$-neighborhood and fix a linear order among its elements. This way, we can speak of the
first, second, and so on, element of an $(rk)$-neighborhood. For technical reasons, we actually fix
a linear order for each $l$-neighborhood for $l \le rk$ such that (i) it is compatible with the
distance from the center of the neighborhood: the center is first, then come all the elements
at distance 1, then all elements at distance 2 and so on, and (ii) the order of an $(l + 1)$-type
is consistent with the order on the induced $l$-type.

For some sequence $F = (\alpha_2, \ldots, \alpha_m)$ of $(m - 1)$ elements from $[1, \ldots, d^{rk}]$, we write
$\bar x = F(x_1)$ for the fact that, for $j \in \{2, \ldots, m\}$, $x_j$ is the $\alpha_j$-th element of the
$(rk)$-neighborhood of $x_1$. Let $\mathcal{F}^m_{rk}$ be the set of all possible such $F$. Let $\mathcal{F}_{rk} = \bigcup_{1 \le m \le k} \mathcal{F}^m_{rk}$.

For a given $\bar x = x_1 \ldots x_k$, an $r$-partition of $\bar x$ is a set of pairs $\{(C_1, F_1), \ldots, (C_m, F_m)\}$
such that $\emptyset \ne C_i \subseteq \bar x$, $\bigcup_{1 \le i \le m} C_i = \{x_1, \ldots, x_k\}$, $C_i \cap C_j = \emptyset$ for $i \ne j$, and $F_i \in \mathcal{F}^{|C_i|}_{rk}$. For
a given $r$-partition $C$ of $\bar x$ and $(C_i, F_i) \in C$ we write $\bar x^i$ to represent the variables from $C_i$, $x^i_1$
to represent the first variable from $C_i$, $x^i_2$ to represent the second variable and so on.

For a given $r$-partition $C = \{(C_1, F_1), \ldots, (C_m, F_m)\}$ of $\bar x$, by $\mathrm{Div}^C(\bar x)$ we mean a
conjunction of formulas saying that $N_r(\bar x^i) \cap N_r(\bar x^j) = \emptyset$ for all $1 \le i \ne j \le m$ and formulas
$\bigwedge_{(C_i, F_i) \in C} \bar x^i = F_i(x^i_1)$. Note that the latter part implies that $\bar x^i$ is connected around $x^i_1$.

The following is an immediate consequence of Theorem 2.1.



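Lemma 3.1. Fix a structure $\mathcal{A}$. Then any formula $\varphi(\bar x)$ with $k$ free variables is equivalent
over $\mathcal{A}$ to a formula of the form

$$\bigvee_{C \in \mathcal{C}_r(\bar x)} \Bigl( \mathrm{Div}^C(\bar x) \wedge \bigvee_{(\tau_1, \ldots, \tau_{|C|}) \in S_C}\ \bigwedge_{i \le |C|} \mu_{\tau_i}(x^i_1) \Bigr) \qquad (3.1)$$

where $r = 2^{|\varphi|}$, $\mathcal{C}_r(\bar x)$ is the set of all $r$-partitions of $\bar x$, and $S_C \subseteq (\mathcal{T}_{rk})^{|C|}$ is finite.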



Proof. Let $\varphi(\bar x)$ be a formula with $k$ free variables and $r = 2^{|\varphi|}$. As in the statement of this
lemma, we denote by $\mathcal{C}_r(\bar x)$ the set of all $r$-partitions $C = \{(C_1, F_1), \ldots, (C_m, F_m)\}$ of
$\bar x = (x_1, \ldots, x_k)$.

By taking all possible $r$-partitions over $\bar x$ we see that $\varphi(\bar x)$ is equivalent to:

$$\bigvee_{C \in \mathcal{C}_r(\bar x)} \bigl( \mathrm{Div}^C(\bar x) \wedge \varphi(\bar x) \bigr)$$

Let $\bar a$ be a tuple of $\mathcal{A}$ such that $\mathcal{A} \models \varphi(\bar a)$. Thus for exactly one $C \in \mathcal{C}_r(\bar x)$,
$\mathcal{A} \models \mathrm{Div}^C(\bar a) \wedge \varphi(\bar a)$. As $\mathrm{Div}^C$ implies that the variables from each $C_i$ with $(C_i, F_i) \in C$ are
connected, the $r$-neighborhood of each $\bar a^i$ is completely included in the $(rk)$-neighborhood
of $a^i_1$. Let $m = |C|$. For $1 \le i \le m$, let $\tau_i$ be the $(rk)$-neighborhood-type of $a^i_1$. We now take
$S_C$ as the set of all such tuples $(\tau_1, \ldots, \tau_m)$ for all tuples $\bar a$ such that $\mathcal{A} \models \mathrm{Div}^C(\bar a) \wedge \varphi(\bar a)$. By
construction we have that $\varphi(\bar x)$ implies (3.1). The reverse inclusion is an immediate consequence
of the Gaifman Locality Theorem: when $\mathrm{Div}^C(\bar a)$ holds, $\mathcal{N}_r(\bar a^i)$ is induced by $\mathcal{N}_{rk}(a^i_1) = \tau_i$
and $F_i$. Moreover, $\mathcal{N}_r(\bar a)$ is the disjoint union of the $\mathcal{N}_r(\bar a^i)$ and is therefore induced by $C$. □

We are now ready to prove Theorem 2.3.

Proof of Theorem 2.3. Fix a formula $\varphi(\bar x)$ with $k$ free variables. Let $\mathcal{A}$ be a structure. Let
$r = 2^{|\varphi|}$. By Lemma 3.1, $\varphi(\bar x)$ is equivalent over $\mathcal{A}$ to a formula of the form given by (3.1).
We assume that $\mathcal{A}$ comes with a linear order over its elements. If not, we use the linear
order induced by the encoding of $\mathcal{A}$.

Intuitively the precomputation phase determines the disjunction given by (3.1) and
precomputes the $(rk)$-neighborhoods of each element of $\mathcal{A}$. The fact that this can be done
in time linear in $\|\mathcal{A}\|$ and triply exponential in $|\varphi|$ will make use of Theorem 2.2.

In a first step, for each $i \le rk$ we precompute the pairs of nodes at distance $i$. In other
words, for each $a$ in $A$, we compute the set of elements $b$ such that $\delta(a, b) = i$. This can
easily be done in time linear in $rk \cdot \|\mathcal{A}\|$ by induction on $i$: during the base case we compute
the Gaifman graph of $\mathcal{A}$ and then we perform the classical computation of the transitive
closure of this graph up to depth $rk$.
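A minimal sketch of this first step, assuming the Gaifman graph is already available as adjacency sets. The function name `distance_layers` and the table layout are choices made for this illustration and are not taken from the original construction.

```python
def distance_layers(adj, a, depth):
    """Return layers with layers[i] = set of elements at distance exactly i
    from a, for every i <= depth."""
    layers = [{a}]
    seen = {a}
    for _ in range(depth):
        frontier = set()
        for u in layers[-1]:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    frontier.add(v)
        layers.append(frontier)
    return layers

# Hypothetical example: Gaifman graph of the path 1 - 2 - 3 - 4 - 5.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
# One bounded-depth search per element; "depth" plays the role of rk.
table = {a: distance_layers(adj, a, 2) for a in adj}
```

For a fixed degree bound and a fixed depth each search touches a bounded number of elements, so filling the whole table takes time linear in the number of elements.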

In a second step, the precomputation phase computes for each element $a$ of $\mathcal{A}$ its
$(rk)$-neighborhood: for each element $a$ of $\mathcal{A}$, we compute its $(rk)$-neighborhood-type and for all
$i \le d^{rk}$ a pointer from $a$ to the $i$-th element of its $(rk)$-neighborhood. We use an induction
on the radius of the neighborhood to achieve this goal within the desired time constraints.

As 0-neighborhoods all share the same isomorphism type and have just one pointer to
their centers, the induction base is obvious. So let's assume that in linear time in the size
of $\mathcal{A}$ we have computed all $l$-neighborhoods for all nodes. With one more linear pass we
now compute the $(l + 1)$-neighborhoods. Fix $a \in A$. From the first step, we have all the
elements of $A$ at distance $l + 1$ from $a$. As we already have computed the $l$-neighborhood,
it remains to try all possible orders among those elements and test isomorphism with the
ordered types we have initially fixed.

There are at most $d^{l+1}$ nodes at distance $l + 1$ and $l < rk$. Hence the number of
orders we need to test is bounded by $(d^{rk})!$. Once the order is fixed we try all possible
$(rk)$-neighborhood-types that we have initially fixed (there are $|\mathcal{T}_{rk}|$ possibilities) and then
test that the two orders induce an isomorphism (each test simply requires going through all
tuples of the neighborhood). Let $s(r, k, d)$ be the maximal size of an $(rk)$-neighborhood. Thus
this step is altogether achieved in time $O((d^{rk})! \cdot |\mathcal{T}_{rk}| \cdot s(r, k, d))$, which is triply exponential
in $|\varphi|$ because $r = 2^{|\varphi|}$, $|\mathcal{T}_{rk}|$ is at most doubly exponential in $rk$ and $s(r, k, d)$ is bounded by $d^{rk}$.

During the third step of the precomputation we determine the $(rk)$-neighborhood-types
that are relevant for $\varphi$ over $\mathcal{A}$. Fix an $r$-partition $C = \{(C_1, F_1), \ldots, (C_m, F_m)\}$ in $\mathcal{C}_r(\bar x)$ and a
sequence $\tau_1, \ldots, \tau_m \in \mathcal{T}_{rk}$. This sequence is relevant for $C$ if

$$\mathcal{A} \models \exists \bar x\ \mathrm{Div}^C(\bar x) \wedge \bigwedge_j \mu_{\tau_j}(x^j_1) \wedge \varphi(\bar x).$$

Notice that the tests of the form $\mu_{\tau_j}(x^j_1)$ have been precomputed during the second
step and can therefore now be treated as unary symbols. Similarly the tests $\mathrm{Div}^C(\bar x)$ can
be expressed using the graph computed during the first step. Altogether, the first and
second steps have replaced $\mathrm{Div}^C(\bar x) \wedge \bigwedge_j \mu_{\tau_j}(x^j_1)$ with a formula of size linear in $k$. Hence
we can apply Theorem 2.2 in order to test whether the sequence is relevant for $C$ in time
linear in $\|\mathcal{A}\|$ and triply exponential in the size of the formula. We do this for all possible
$C$, investigating at most $|\mathcal{T}_{rk}|^{k} = 2^{d^{2^{O(|\varphi|)}}}$ cases. The number of possible $C$ is the number
of possible splits of $k$ variables into disjoint and nonempty subsets multiplied by $|\mathcal{F}_{rk}|^{k}$,
which altogether is again $2^{d^{2^{O(|\varphi|)}}}$. For each $C$ we store a list of all sequences relevant for
it. We call an $r$-partition $C$ relevant if that list is nonempty.

The last step of the precomputation phase orders, for each $\tau \in \mathcal{T}_{rk}$, the elements of $\mathcal{A}$
having that particular $(rk)$-neighborhood-type and stores a pointer from one element to the
next one according to the linear order on the elements of A. To do that, we just need to 
enumerate through all the elements in A, in the order provided by the linear order on its 
elements, and, using information obtained in the second step, add each of them to a proper 
list. In order to do this we need to be able to sort a set of elements in linear time and this 
can be done in our RAM model as explained in 
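The index built by this last step can be sketched as one linked list per neighborhood type. The sketch assumes the elements are already given in increasing order and that a hypothetical table `type_of` holds the type identifiers computed in the second step. It uses Python dictionaries where the RAM model would use arrays.

```python
def build_type_lists(elements, type_of):
    """elements: the domain in increasing order; type_of: element -> type id.
    Returns (first, nxt) where first[t] is the least element of type t and
    nxt[a] is the next element with the same type as a, or None."""
    first, last, nxt = {}, {}, {}
    for a in elements:
        t = type_of[a]
        nxt[a] = None
        if t in last:
            nxt[last[t]] = a  # link the previous element of this type to a
        else:
            first[t] = a
        last[t] = a
    return first, nxt

def elements_of_type(first, nxt, t):
    """Enumerate the elements of type t in increasing order by following
    the pointers, one pointer step per output."""
    a = first.get(t)
    while a is not None:
        yield a
        a = nxt[a]

# Hypothetical example with two type identifiers.
elements = [1, 2, 3, 4, 5, 6]
type_of = {1: "end", 2: "mid", 3: "mid", 4: "mid", 5: "mid", 6: "end"}
first, nxt = build_type_lists(elements, type_of)
mids = list(elements_of_type(first, nxt, "mid"))  # [2, 3, 4, 5]
```

Following a pointer costs constant time, so the elements of a given type can later be produced without scanning over elements of other types.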

Altogether we have a precomputation phase with the desired properties: it works in time
linear in $\|\mathcal{A}\|$ and triply exponential in $|\varphi|$. We now turn to the enumeration phase.

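Fix a relevant $r$-partition $C = \{(C_1, F_1), \ldots, (C_m, F_m)\}$ in $\mathcal{C}_r(\bar x)$. We show how to
enumerate in lexicographical order, with no repetition, constant memory and constant delay,
all the tuples $\bar a$ such that $\mathcal{A}, \bar a \models \mathrm{Div}^C(\bar x) \wedge \bigvee_i \bigwedge_j \mu_{\tau^i_j}(x^j_1)$, where the disjunction ranges
over the sequences $(\tau^i_1, \ldots, \tau^i_m)$ relevant for $C$. The result will then follow from
the following simple lemma, whose proof consists in merging two ordered lists.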

Lemma 3.2 ([1]). If there is a linear order $<$ such that $R, R'$ are in CONSTANT-DELAY$_{lin}$
and both output their answers in increasing order relative to $<$, then $R \cup R'$ is also in
CONSTANT-DELAY$_{lin}$ and the answers can be enumerated in increasing order relative to $<$.

□ 
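The merging argument behind Lemma 3.2 can be sketched with two iterators that each produce their answers in strictly increasing order. The function name `merge_ordered` is an assumption of this illustration, and Python tuples are compared lexicographically, which matches the order used on answers.

```python
def merge_ordered(left, right):
    """Merge two strictly increasing iterators into one strictly increasing
    stream; a value produced by both inputs is emitted only once."""
    sentinel = object()  # marks an exhausted input
    left, right = iter(left), iter(right)
    x, y = next(left, sentinel), next(right, sentinel)
    while x is not sentinel or y is not sentinel:
        if y is sentinel or (x is not sentinel and x < y):
            yield x
            x = next(left, sentinel)
        elif x is sentinel or y < x:
            yield y
            y = next(right, sentinel)
        else:  # x == y: output once and advance both inputs
            yield x
            x, y = next(left, sentinel), next(right, sentinel)

# Hypothetical answer streams, each sorted lexicographically.
r1 = [(1, 2), (1, 4), (3, 1)]
r2 = [(1, 3), (1, 4), (2, 2)]
merged = list(merge_ordered(r1, r2))
```

Each output requires at most one step on each input, so the delay of the merged enumeration is bounded by the sum of the two delays plus a constant, and only the two current values need to be kept in memory.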

The proof is by induction on the number $m$ of classes in the $r$-partition $C$. The base
case being a particular case of the inductive step, we only do the inductive step.

Without loss of generality we assume that the most significant variable of $\bar x$ is the
first variable of $\bar x^1$, that the most significant variable of $\bar x \setminus \bar x^1$ is the first variable of $\bar x^2$ and
so on. We simultaneously do the following for each sequence $\tau_1, \ldots, \tau_m$ relevant for $C$ and
use Lemma 3.2 to avoid duplicate answers.

Fix $\tau_1, \ldots, \tau_m$ relevant for $C$. Using the precomputed pointers we can enumerate one by
one all elements $a_1$ of $\mathcal{A}$ whose $(rk)$-neighborhood-type is $\tau_1$. For each such element let
$\bar a^1 = F_1(a_1)$ and we enumerate, by induction, the solutions for $\psi = \mathrm{Div}^{C'}(\bar x) \wedge \bigvee_i \bigwedge_j \mu_{\tau^i_j}(x^j_1)$,
where $C'$ is $C$ with $(C_1, F_1)$ removed. For each solution $\bar b$ obtained by induction, we check
whether $N_{rk}(a_1)$ intersects with $N_r(\bar b)$ or not (recall that this information has been
precomputed during the first phase and therefore requires only constant time). If it does not, we
have a solution $\bar a^1, \bar b$ for $\varphi$ because of (3.1). If it does then we move to the next solution
to $\psi$. Notice that the size of $N_{rk}(a_1)$ is bounded by $d^{rk}$, hence the length of a run of false hits is
bounded by $d^{rk}$. As we consider only relevant sequences of pairs, for each $\bar a^1$ we are certain
to find at least one matching $\bar b$ that gives us a solution $\bar a^1, \bar b$ to $\varphi$. Altogether we get the
desired constant delay for the enumeration process.
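The skipping of false hits can be sketched in a heavily simplified setting with two classes of one variable each. The table `blocked` is a hypothetical stand-in for the precomputed constant-time intersection test: `blocked[a]` holds the elements whose neighborhood meets that of `a`. The names and the example are assumptions of this illustration, not the full inductive procedure.

```python
def enumerate_disjoint_pairs(candidates_a, candidates_b, blocked):
    """Yield pairs (a, b), in lexicographic order when both candidate lists
    are increasing, such that b is not in blocked[a]."""
    for a in candidates_a:
        for b in candidates_b:
            if b in blocked[a]:
                continue  # false hit: at most len(blocked[a]) of them per a
            yield (a, b)

# Hypothetical example on the path 1 - 2 - ... - 6 with r = 1: the
# 1-neighborhoods of a and b intersect iff their distance is at most 2.
elements = [1, 2, 3, 4, 5, 6]
blocked = {a: {b for b in elements if abs(a - b) <= 2} for a in elements}
pairs = list(enumerate_disjoint_pairs(elements, elements, blocked))
```

Since every set `blocked[a]` has size bounded in terms of the degree and the radius only, the number of candidates skipped between two outputs for the same `a` is bounded by a constant.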

The enumeration phase needs to process all possible $r$-partitions $C$ and all relevant
sequences of types from $\mathcal{T}_{rk}$, i.e. a number of cases triply exponential in $|\varphi|$. Note that each such choice
yields disjoint solution sets and can therefore be considered sequentially. Altogether this
yields a procedure linear in the size of the output and triply exponential in $|\varphi|$. □

4. Conclusion 

We have given a new proof that first-order queries over structures of bounded degree can
be enumerated with a linear time precomputation and constant delay. Our procedure is based on Gaifman's
locality theorem for first-order logic and our constants are triply exponential in the size of
the query; it therefore induces the known complexity of the associated model checking
problem.